In September 2015, the US government released its College Scorecard, which includes data on college 
costs, student debt loads, graduation rates, and postgraduate earnings for over 7,000 colleges and 
universities. Most private college ratings, such as those published by US News and World Report, rely 
primarily on inputs, such as class size, financial resources, and student selectivity. The College Scorecard 
provides, for the first time, information on downstream outcomes, such as earnings, and does so in a 
user-friendly format that allows for comparisons of benefits and costs across institutions. This 
represents a major advance in higher education accountability because colleges can be judged by the 
value they provide students, rather than by the size of their endowments or their ability to attract the 
most talented applicants. 

Still, it is worth asking why we need the College Scorecard at all. One reason is that measurement 
drives institutional incentives and performance in both desirable and undesirable ways. This is known 
colloquially as Campbell's law (2011): “The more any quantitative social science indicator is used for 
social decision-making, the more subject it will be to corruption pressures and the more apt it will be to 
distort and corrupt the social processes it is intended to monitor.” Because consumers respond to 
signals about quality, institutions may respond strategically to performance measurement even when 
there are no explicit “stakes” attached. 1 Absent publicly provided data on student outcomes, private 
college rating systems such as US News will focus on what they can measure, and colleges will respond 
accordingly. 

Colleges care immensely about their reputations, as anyone who has worked in higher education 
leadership can attest. Many colleges admit students strategically in response to private rating systems 
such as US News, and movement within the ranks affects applicant behavior and perceived institutional 
quality. 2 Notably, private ratings provoke responses without any explicit rewards or sanctions. Thus, the 
College Scorecard—because of its focus on earnings and completion rates rather than on college 
reputation and resources—has the potential to better align institutional responses with student 
outcomes. If policymakers want to take the next step and create performance measures using data like 
the College Scorecard, the challenge is to design them so that pressure to improve measured 
performance improves actual performance and helps students achieve their goals. 

Accountability in US K-12 Education 

Accountability has been around longer in US K-12 education, and its history provides valuable lessons 
(and cautionary tales) for higher education. The K-12 accountability movement culminated with the 
passage of the No Child Left Behind Act of 2001, which required all states to test students regularly in 
core subjects and to evaluate schools based on whether their students were making adequate progress 
toward achievement benchmarks, with the goal of 100 percent proficiency by 2014. Perhaps 
unsurprisingly, it became clear shortly after the passage of No Child Left Behind that states would fail to 
attain the lofty goal of 100 percent proficiency. In December 2015, No Child Left Behind was replaced 
by the Every Student Succeeds Act, which scales back testing requirements and returns more 
implementation power to the states. 

Researchers have consistently found evidence of strategic responses to K-12 accountability, such 
as narrowing the curriculum at the expense of nontested groups and subjects, focusing on “bubble” 
students, and “teaching to the test.” 3 Narrowing the curriculum is not necessarily a bad thing. Yet some 
strategic responses to accountability are harder to rationalize. Research shows that schools increase 
the calorie content of meals on test days in an apparent attempt to artificially inflate test scores (Figlio 
and Winicki 2005). Other more insidious responses to accountability pressure include strategic 
reclassification of students into disability categories, suspending low-performing students from school 
when the tests are given, and teacher cheating. 4 

While schools respond strategically to accountability pressure, they also make substantive changes, 
such as lengthening instructional time and increasing focus on low-performing students who need extra 
help (Deming et al. 2016; Reback, Rockoff, and Schwartz 2014; Rouse et al. 2013). Several studies find 
that accountability pressure to increase performance on high-stakes tests led to gains on low-stakes 
assessments in similar subjects. 5 Studies of accountability consistently find larger gains for 
disadvantaged students and for schools that are close to receiving a “failing” grade. 6 

Broadly, there are four lessons we can learn from studying accountability in K-12 education. 

1. Schools will respond strongly to the chosen performance metrics, often at the expense of other 
dimensions of performance. 

2. Design details strongly influence behavior. For example, base-rate targets, such as percent 
proficient, lead schools to focus on bubble students, who are close to the margin of passing, at 
the expense of low-scoring and high-scoring students. 7 

3. Complexity makes the system less useful for consumers and increases the scope for strategic 
responses. The more metrics, student groups, and exceptions, the more easily schools can 
“game the system” by leaning on its weak points. 

4. Accountability works best for low-performing institutions, possibly because these institutions 
do not face strong internal pressure to improve from students or school leadership or because 
they do not face market pressure to maintain enrollment. 
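
The second lesson can be illustrated with a toy calculation. The sketch below is purely illustrative: the scores, the cutoff of 50, and the function names are hypothetical and are not drawn from any study cited here. It shows that a percent-proficient metric moves only when a student crosses the cutoff, so the same five-point gain registers when it goes to students just below the bar and does not register when it goes to the lowest scorers.

```python
def percent_proficient(scores, cutoff):
    """Share of students scoring at or above the cutoff, in percent."""
    return 100.0 * sum(s >= cutoff for s in scores) / len(scores)


def boost(scores, lo, hi, gain):
    """Add `gain` points to every student whose score lies in [lo, hi)."""
    return [s + gain if lo <= s < hi else s for s in scores]


# Hypothetical class of ten students and a hypothetical proficiency cutoff.
scores = [20, 25, 30, 46, 47, 48, 55, 60, 80, 90]
CUTOFF = 50

baseline = percent_proficient(scores, CUTOFF)
# The same 5-point gain, targeted at two different groups of students.
help_lowest = percent_proficient(boost(scores, 0, 45, 5), CUTOFF)
help_bubble = percent_proficient(boost(scores, 45, 50, 5), CUTOFF)

print(baseline, help_lowest, help_bubble)
```

In this made-up class, helping the three lowest scorers leaves the metric unchanged, while helping the three students just below the cutoff raises it substantially. A metric based on average scores or score gains would credit both kinds of improvement.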

Implications for Higher Education Accountability 

The Case for Information without Explicit Ratings 

Compared with their K-12 counterparts, US colleges and universities have broader purposes and serve 
a greater variety of students. This increases the difficulty of designing an effective accountability 
system because the benchmarks are harder to agree upon and the scope for strategic responses is 
greater. 

While K-12 schools are mostly required to take all comers, postsecondary institutions choose 
which students to admit, and even open-enrollment institutions engage in subtle forms of selection. 
Public K-12 institutions have clear and well-defined missions and offer a standard curriculum. In 
contrast, higher education institutions decide which programs to offer and differ greatly in their stated 
institutional missions. Colleges and universities also operate in different markets, ranging from 
open-access community colleges with a mandate to serve the local economy to elite institutions that compete 
for the best students on a global scale. 

Trying to rate colleges and universities on a few common standards may not make sense. For 
example, the College Scorecard lists both Boston University and the New England Conservatory of 
Music as having an average annual cost (i.e., the average price net of all financial aid) of around $35,000. 
Yet the average salary for Boston University graduates 10 years later is more than double that of 
New England Conservatory graduates ($60,600 versus $29,500). Are we comfortable rating colleges 
according to a financial benefit-cost calculation 
that will penalize students who self-select into lower-earning fields of study? 
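
To make the comparison concrete, the arithmetic behind such a rating can be sketched in a few lines. This is only a sketch under stated assumptions: the ratio of average salary to average annual net cost is an illustrative stand-in for a benefit-cost metric, not a measure the College Scorecard reports, and both costs are set to the rounded figure of $35,000 quoted above.

```python
# Figures quoted in the text; net cost is the rounded "around $35,000" value.
colleges = {
    "Boston University": {"net_cost": 35_000, "salary": 60_600},
    "New England Conservatory of Music": {"net_cost": 35_000, "salary": 29_500},
}


def earnings_to_cost(record):
    """Hypothetical metric: average salary divided by average annual net cost."""
    return record["salary"] / record["net_cost"]


ratios = {name: round(earnings_to_cost(r), 2) for name, r in colleges.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

On this single dimension the conservatory scores roughly half as well as Boston University, even though the gap largely reflects the field its students chose to enter.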

One option is to stick to a limited form of “report card” accountability by eschewing ratings and 
rankings of colleges and universities, but instead providing transparent and easily digestible 
performance information and allowing consumers to use it as they see fit. This is the College Scorecard 
approach, and it has much appeal. The federal government has a unique and irreplaceable role as 
provider of standardized, high-quality outcome data in all areas from crime to labor markets to health 
care. Federal data are especially important in cases such as education, where information problems 
abound. Federal provision of timely, standardized data will make the free market work better than it 
would absent these data. 

But even if we decide that this form of report card accountability for higher education is sufficient, 
we must think about what information to provide and in what way. 

Targets and Trade-Offs in a College Rating System 

What would be the impact of using the variables in the College Scorecard to create a high-stakes system 
of college ratings? Suppose that ratings were based on metrics such as graduation rates, postgraduate 
earnings, loan default rates, and average debt burdens. Although these variables can capture variation 
in institution quality, they likely reflect differences in the preexisting characteristics of admitted 
students. 

Institutions that serve primarily disadvantaged or first-generation college students will, all else 
equal, have lower graduation rates and higher borrowing rates. Colleges with missions to educate 
larger numbers of first-generation and disadvantaged students will look less attractive by these criteria. 
Thus, one concern with high-stakes ratings is that colleges might alter admission criteria to favor 
particular students, such as students with better family financial situations. 

In principle, one can “risk adjust” performance standards to reflect preexisting differences in the 
likelihood of student success. This can reduce some aspects of selection, but as with “value-added” 
approaches in K-12 education, risk adjustment can increase measurement error and reduce 
transparency and public confidence in the rating system. In higher education, there are large differences 
in students’ prior preparation even within open-access institutions (Kurlaender, Carrell, and Jackson 
2016). Without appropriate risk adjustment, ratings can penalize institutions for whom they enroll 
rather than for how well they serve those students. 

Moreover, the data reported in the College Scorecard are calculated based on different populations 
for each outcome. For instance, while college graduation rates are calculated for all students, average 
costs of attendance and subsequent earnings are calculated only among federal financial aid recipients. 
This means that outcome data are missing for some students and not others, making risk adjustment 
important for apples-to-apples comparisons across institutions. 

An important complication in evaluating colleges with employment and earnings outcomes involves 
comparisons across field of study. Four-year college graduates with the highest-paying majors earn two 
and a half times on average what four-year college graduates with the lowest-paying majors earn 
(Hershbein and Kearney 2014). Majors that prepare students to work with children (e.g., early 
childhood education and elementary education) or provide community and counseling services (e.g., 
family sciences, social work, and theology) have the lowest average earnings. Evaluating institutions on 
one dimension, such as earnings, could lead to reductions in opportunities to prepare for socially 
desirable but not financially lucrative fields. 

Still, some of the adjustments and trade-offs from greater accountability in higher education may be 
welcome. After all, if students who are undecided about majors get a nudge toward a choice that pays 
better, or if schools put more emphasis on a high graduation rate and a lower debt burden, such steps 
may overall be beneficial. 

Any high-stakes system of college ratings will be a balance between the two competing objectives 
of access and institutional performance. Performance metrics create strong incentives for colleges to 
cream-skim the most prepared students. Because it is impossible to measure everything important to 
know about a college’s applicants, no risk adjustment will fully remediate these incentives. This is an 
especially important issue in thinking about accountability for open-access institutions—such as 
community colleges—that enroll many low-income and first-generation students. The challenge is to 
create incentives for better performance while minimizing harmful strategic responses. 

Design Principles for Higher Education Accountability 

A first design principle is that a college rating system should be kept as simple as possible. Simple, 
transparent performance measures facilitate consumer choice and reduce the scope for strategic 
responses. There is an important trade-off between simple metrics such as graduation rates and 
complicated value-added models that try to risk adjust for student characteristics. Although risk 
adjustment may improve accuracy, a downside of the added complexity is that students and families are 
less likely to respond to performance metrics that they do not fully understand. On the other hand, 
colleges have strong incentives to understand the details of any rating system—no matter how 
complex—and find its weak points. One compromise approach to risk adjustment is to measure relative 
performance within “equivalence classes” of postsecondary institutions that represent likely choice sets 
for certain students. Equivalence classes could be based on geographical proximity and measures of 
selectivity, or they could be constructed empirically using overlap in students’ actual choice sets (Avery 
et al. 2013). 
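
The compromise approach described above can be sketched as follows. The institutions, class labels, and graduation rates are hypothetical, and measuring relative performance as the gap between an institution's graduation rate and the mean for its equivalence class is one simple choice among many, not a method proposed in the studies cited here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical institutions: (name, equivalence class, graduation rate in percent).
institutions = [
    ("College A", "open-access, metro 1", 30.0),
    ("College B", "open-access, metro 1", 40.0),
    ("College C", "selective, national", 90.0),
    ("College D", "selective, national", 80.0),
]


def relative_performance(rows):
    """Return each institution's graduation rate minus its class average."""
    by_class = defaultdict(list)
    for _, cls, rate in rows:
        by_class[cls].append(rate)
    class_mean = {cls: mean(rates) for cls, rates in by_class.items()}
    return {name: rate - class_mean[cls] for name, cls, rate in rows}


relative = relative_performance(institutions)

for name, gap in relative.items():
    print(f"{name}: {gap:+.1f} points relative to its class")
```

In this made-up example, College B outperforms its peers while College D underperforms, even though College D's raw graduation rate is twice as high. The comparison stays easy to explain because it needs only a class label and one outcome.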

A second design principle is to target the postsecondary institutions that are least likely to respond 
to market forces absent accountability. Elite colleges already compete fiercely for students, many of 
whom voluntarily pay large sums of money to attend. A government rating is unlikely to change 
behavior in this competitive environment. (To be clear, this does not mean we believe all students at 
elite institutions are getting a great and cost-effective education, only that additional accountability 
measures will likely have little impact on institutional behavior.) In contrast, both less-selective local 
public institutions and for-profit colleges are heavily dependent on public subsidies, the former from 
taxpayer-funded state and local appropriations and the latter from federal Title IV financial aid. 
Dependence on taxpayer largesse justifies tighter regulation because many of these colleges could not 
stay in business without government support. 

But, keeping in mind that a complex rating system creates strong incentives to cream-skim the best 
applicants, accountability systems must prioritize access for low-income and first-generation college 
students. One idea is to focus on certifying a minimum standard of quality, rather than assigning grades 
or ratings to institutions all along the spectrum. Similar to health inspections or the consumer drug 
approval process, the job of a higher education accountability system could be to certify that schools are 
good enough to receive public support. 

Public certification of postsecondary institutions already exists in the form of accreditation, yet the 
current system has proven ineffectual at regulating bad actors. 8 A possible hybrid approach would 
involve designing an inspectorate system that is “turned on” when an institution falls below quantitative 
benchmarks. Although school inspections are resource intensive, targeting toward the lowest 
performers would limit the cost of such a program. 

The federal Gainful Employment regulations that went into effect in 2015 are one—albeit 
imperfect—example of regulating a minimum quality standard. The Gainful Employment regulations 
specify that graduates of nearly all for-profit programs (along with certificate programs at not-for-profit 
and public institutions) must have an annual loan payment that does not exceed 20 percent of 
discretionary income or 8 percent of total earnings. The penalty for repeatedly exceeding this 
debt-to-earnings threshold is the withdrawal of eligibility for that institution to disburse federal Title IV financial 
aid. The design of Gainful Employment is simple and straightforward, and the regulation concentrates 
on the worst offenders. But a legitimate criticism is that it unfairly targets degree programs in the 
for-profit sector and leaves poorly performing public institutions untouched. 9 
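
The debt-to-earnings test described above is simple enough to sketch in a few lines. This is a sketch under stated assumptions rather than the regulatory formula: we read the rule as satisfied when either threshold is met, we take discretionary income as an input because its definition is not given here, and the function name and dollar amounts are hypothetical.

```python
def meets_debt_to_earnings(annual_loan_payment, total_earnings, discretionary_income):
    """Check one year of data against the two thresholds named in the text.

    Assumption: meeting either threshold is enough to pass.
    """
    # Threshold 1: payment is at most 8 percent of total earnings.
    earnings_rate_ok = annual_loan_payment <= 0.08 * total_earnings
    # Threshold 2: payment is at most 20 percent of discretionary income.
    discretionary_rate_ok = (
        discretionary_income > 0
        and annual_loan_payment <= 0.20 * discretionary_income
    )
    return earnings_rate_ok or discretionary_rate_ok


# Hypothetical programs: (annual payment, total earnings, discretionary income).
print(meets_debt_to_earnings(2_000, 30_000, 12_000))  # within 8 percent of earnings
print(meets_debt_to_earnings(3_000, 30_000, 12_000))  # exceeds both thresholds
print(meets_debt_to_earnings(3_000, 35_000, 17_000))  # within 20 percent of discretionary income
```

A rule this short is easy for students and institutions to check for themselves, which is part of its appeal as a minimum quality standard.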

A third design principle is that accountability should ensure that both students and postsecondary 
institutions have some “skin in the game.” Concretely, one could design an accountability system where 
oversight and regulatory control increase with the share of institutional revenue that comes from 
public sources. 10 Colleges that can attract full-paying customers—either out-of-state students or 
students who do not qualify for financial aid—have implicitly survived a market test and should be 
allowed to operate more freely. This does not mean that public institutions cannot be heavily 
subsidized, but it does call for greater scrutiny when taxpayers are footing more of the bill. 

A more direct approach is risk sharing, where institutions would pay for some share of student loans 
that subsequently end up in default. See Chou, Looney, and Watson (2017) for a proposal along these 
lines. Possible strategic responses include cream-skimming on students’ ability to pay and 
underproviding programs with high social value but low downstream income potential. But, given the 
high rate of student loan default and rising student indebtedness, this may be a worthwhile trade-off 
(Looney and Yannelis 2015). 

Conclusion 

The rationale for increased accountability in the higher education sector is clear. But designing a 
well-functioning accountability system is difficult. The experience from accountability in K-12 education, 
higher education, and other settings demonstrates that “what gets measured gets done” in both socially 
desirable and undesirable ways. 

Our main recommendations are as follows: 

■ Policymakers should consider report card accountability—that is, providing timely, transparent, 
high-quality information about institutional performance but without any explicit ratings. 

If we construct a college rating system, it should 

■ be as simple as possible to reduce the scope for strategic responses, 

■ focus on the lowest-performing and least competitive colleges and markets, and 

■ be designed so that both students and institutions have some skin in the game. 

If performance measures work, they will provoke a mix of real improvement and strategic 
responses. This may be the time for state-level policy experimentation. If different states try different 
forms of higher education accountability, we will learn more about which approaches yield the greatest 
benefits. 

Notes 


1. In K-12 education, we observe that government ratings of schools without official stakes led to changes in 
residential real estate prices (see Figlio and Lucas 2004 and subsequent papers) and private contributions to 
schools (Figlio and Kenny 2009). When members of the general public provide or withdraw support from 
schools, or when their asset values depend on measures of school quality, they exert pressure on schools to 
improve along the measured dimensions. 

2. See Bastedo and Bowman (2010), Espeland and Sauder (2007), Luca and Smith (2013), Meredith (2004), and 
Monks and Ehrenberg (1999). 

3. See Booher-Jennings (2005), Diamond (2007), Diamond and Spillane (2004), Hamilton, Berends, and Stecher 
(2005), Ladd and Lauen (2010), Neal and Schanzenbach (2010), Ozek (2012), Reback (2008), Reback, Rockoff, 
and Schwartz (2014), and Stecher and coauthors (2000). 

4. See Cullen and Reback (2006), Deere and Strayer (2001), Deming and coauthors (2016), Figlio (2006), Figlio 
and Getzler (2006), and Jacob and Levitt (2003). 

5. See Chiang (2009), Dee and Jacob (2011), Figlio and Rouse (2006), Greene, Winters, and Forster (2004), Jacob 
(2005), Ladd (1999), and Rockoff and Turner (2010). 

6. See Allen and Burgess (2012), Carnoy and Loeb (2002), Chiang (2009), Dee and Jacob (2011), Deming and 
coauthors (2016), Figlio and Rouse (2006), Lauen and Gaddis (2012), Reback, Rockoff, and Schwartz (2014), 
and Rouse and coauthors (2013). 

7. See Figlio and Ladd (2015), Figlio and Loeb (2011), Neal (2010), Neal and Schanzenbach (2010), and Ozek (2012). 

8. The US Department of Education keeps a list of regional and national accreditors, and in principle, institutions 
must be approved by an accreditor’s regular inspections to distribute federal financial aid. Yet in practice, 
accreditors—who are paid by the institutions themselves—appear to be ineffectual at best, much like the role 
of credit rating agencies during the recent financial crisis. As a case in point, the Accrediting Council for 
Independent Colleges and Schools has come under scrutiny for continuing to accredit branches of Corinthian 
Colleges up until Corinthian collapsed in 2015 amid allegations of fraud and financial misconduct. 

9. The Gainful Employment program focuses only on for-profits and certificate programs in nonprofit and public 
institutions. This targeting was partly a regulatory necessity (the phrase “gainful employment” originates from 
language in the Higher Education Act of 1965 that specifies which institutions are allowed to distribute Title IV 
aid) but was also deliberately aimed at the for-profit sector. 

10. A federal regulation known as the 90-10 rule prohibits for-profit colleges from deriving more than 90 percent 
of revenue from Title IV aid. The largest for-profit colleges bump up against this 90 percent cap. This 
dependence on taxpayer largesse, more than for-profit status, justifies tighter regulation. 